Converting MySQL Database Contents to UTF-8

[fr]

Voici comment j'ai converti le contenu de ma base de données WordPress en UTF-8. C'est assez simple en soi, mais ça m'a pris longtemps pour comprendre comment le faire!

[en]

A few weeks ago, I discovered (to my horror) that my site was not UTF-8, as I thought it was. Checking my pages in the validator produced this error:

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the XML declaration (utf-8). I will use the value from the HTTP header (iso-8859-1).

In all likeliness, my server adds a default header (iso-8859-1) to the pages it serves. When I switched to WordPress, I was careful to save all my import files as UTF-8, and I honestly thought that everything I had imported into the database was UTF-8. Somewhere in the process, it got switched back to iso-8859-1 (latin-1).

The solution to make sure the pages served are indeed UTF-8, as specified in the meta tags of my HTML pages, is to add the following line to .htaccess:

AddDefaultCharset OFF

(If one wanted to force UTF-8, AddDefaultCharset UTF-8 would do it, but actually, it’s better to leave the possibility to serve pages with different encodings, isn’t it?)

Now, when I did that, of course, all the accented characters in my site went beserk — proof if it was needed that my database content was not UTF-8. Here is the story of what I went through (and it took many days to find the solution, believe me, although it takes only 2 minutes to do once everything is ready) to convert my database content from ISO-8859-1 to UTF-8. Thanks a lot to all those who helped me through this — and they are many!

First thing, dump the database. I always forget the command for dumps, so here it is:

mysqldump --opt -u root -p wordpress > wordpress.sql

As we’re going to be doing stuff, it might be wise to make a copy of the working wordpress database. I did that by creating a new database in PhpMyAdmin, and importing my freshly dumped database into it:

mysql -u root -p wordpress-backup < wordpress.sql

Then, conversion. I tried a PHP script, I tried BBEdit, and they seemed to mess up. (Though as I had other issues elsewhere, they may well have worked but I mistakenly thought the problem was coming from there.) Anyway, command-line conversion with iconv is much easier to do:

iconv -f iso-8859-15 -t utf8 wordpress.sql > wordpress-iconv.sql

Then, import into the database. I first imported it into another database, edited wp-config.php to point to the new database, and checked that everything was ok:

mysql -u root -p wordpress-utf8 < wordpress-iconv.sql

Once I was happy that it was working, I imported my converted dump into the WordPress production database:

mysql -u root -p wordpress < wordpress-iconv.sql

On the way there, I had some trouble with MySQL. The MySQL dump more or less put the content of all my weblog posts on one line. For some reason, it didn’t cause any problems when importing the dump before conversion, to create the backup database, but it didn’t play nice after conversion.

I got this error when trying to import:

ERROR 1153 at line 378: Got a packet bigger than 'max_allowed_packet'

Line 378 contained half my weblog posts… and was obviously bigger than the 1Mb limit for max_allowed_packet (the whole dump is around 2Mb).

I had to edit my.cnf (/etc/mysql/my.cnf on my system) and change the value for max_allowed_packet in the section titled [mysqld]. I set it to 8Mb. Then, I had to stop mysql and restart it: mysqladmin -u root -p shutdown to stop it, and mysqld_safe & to start it again (as root).

This is not necessarily the best way to do it, and it might not work like that on your system, but it’s what I did and the site is now back up again. Comments welcome, and hope this can be useful to others!

34 Comments

  1. Posted 8/2/2004 at 7:27 pm | Permalink

    Your post covered exactly what I set out to do. Thank you!

  2. Posted 9/5/2004 at 4:43 pm | Permalink

    FYI, you only got the max_allowed_packet error with the UTF-8 version because UTF-8 takes more space (more bytes) than ISO-8859-15.

  3. Posted 9/16/2004 at 1:11 am | Permalink

    i don’t have shell access but would love to know how to do this using phpmyadmin. any tips?

    i’ve checked phpmyadmin and iconv docs — and the answer may be there — but alas i’m quite the novice and unsure how to word the query within phpmyadmin.

    i am migrating a home-built coldfusion blog to wordpress and it’s full of special characters.

  4. Posted 11/10/2004 at 6:34 am | Permalink

    It is a very useful post, but as the previous commenter said: those people who have only phpmyadmin access to their database (i.e. the majority of the average WP users!) will just stare at it. Is there any chance to “convert” this instruction for them?
    Thanks!

  5. Posted 3/1/2005 at 1:43 pm | Permalink

    I like the part about AddDefaultCharset OFF - didn’t think about that myself. For those without shell access and without access to iconv, I suggest to look at the method I used and has described in the wp support forum, in a thread namedchange from ISO-8859-15 to UTF-8

  6. keli
    Posted 3/8/2005 at 9:23 pm | Permalink

    a few notes to those without shell access (and even on a windows machine on top of that :) )

    phpMyAdmin can dyump mysql data just fine (although it calls it “exporting”). Specify a file name and check the “save to file” option, and make sure you select all your tables.

    As for backup, it’s unnecessary to import it back. just make a copy of the dump if you really want to. This description does contain a lot of redundant saves -beyond your normal safety needs-.

    When you’re finished with conversion –see Uffe’s post above– just use phpMyAdmin’s “import file” feature to import it back.

  7. Posted 4/5/2005 at 1:38 am | Permalink

    Very good! Very helpful!
    Work finelly fixing the problem to a Portuguese (Brazilian) site.
    Thank you!

  8. Posted 4/30/2005 at 4:10 pm | Permalink

    FYI for windows users - open the doc in wordpad and save with utf-8 encoding - should cure what ails you :)

  9. Cloud
    Posted 8/10/2005 at 4:44 pm | Permalink

    Replace utf8 with utf-8 on some systems…

  10. Posted 10/28/2005 at 7:16 pm | Permalink

    Eu le même problème et ayant suivi vos pas ça a marché niquel: zéro problèmes! Je vous remercie énormement!

  11. Alonzo
    Posted 1/6/2006 at 11:23 pm | Permalink

    Thanks, this how-to was working great for me.

  12. Paolo
    Posted 1/23/2006 at 7:48 pm | Permalink

    I’ve find this article partially usefull for me but I’ve imported from a custom CMS and I’ve found better doing “iconv -f windows-1252 -t utf8 file.sql > file_utf8.sql” ’cause sometimes ppl do copy/paste from Word into the textarea.

  13. Posted 1/26/2006 at 6:45 am | Permalink

    Wow this is great! I was just pulling my hair wondering on how to convert the database :)

    Thanx for sharing the knowledge

  14. Posted 2/7/2006 at 10:32 pm | Permalink

    Thanks for that solution - worked like a charm, when I tried to import a greymatter blog written in danish.

    :o) A

  15. Posted 3/16/2006 at 9:46 am | Permalink

    It worked perfect for me. Nice tip!

  16. Posted 6/2/2006 at 6:55 am | Permalink

    Hi,

    I have same problem, can you help me convert ?
    thank !

  17. Max
    Posted 6/23/2006 at 12:17 pm | Permalink

    I think there’s an easier solution than restarting mysql (although I haven’t tested this, because I’ve never had a database big enough). Simply tell mysqldump to limit the size of the packets it creates by adding the following commandline argument:

    –set-variable=max_allowed_packet=1M

  18. Max
    Posted 6/23/2006 at 12:19 pm | Permalink

    That should have been two dashes, not an em dash. Let me try again:

    --set-variable=max_allowed_packet=1M

  19. gorkem
    Posted 8/14/2006 at 12:28 pm | Permalink

    For those using libiconv-1.9.2-1 ,new iconv file, “utf8″ does not work use “utf-8″ instead as in “iconv -f iso-8859-9 -t utf-8 wordp…..” perfectly converted iso türkish database to utf-8. Also be sure to convert your site language files (in joomla/mambo case /languages/turkish.php for example) to utf-8 by opening in textpad/notepad and saving as utf-8.

  20. gorkem
    Posted 8/14/2006 at 12:30 pm | Permalink

    add to above. using phpmyadmin is fine, but do not compress, zip version of phpmyadmin and winrar/winzip does not match, dumpt to just sql, also no need to restart mysql server (at least in localhost), just refresh your page.

  21. Egil
    Posted 8/15/2006 at 11:51 pm | Permalink

    Just a quick note - it is wise to specify the character set when exporting/dumping from mysql with mysqldump, using the –default-character-set=latin1 command. If not, and your system is defaulting to utf8 after an upgrade, you get iso-8859-15 data inside a utf8 file, which took me quite some time to find out (sigh).

  22. Posted 1/31/2007 at 7:22 pm | Permalink

    OR you can

    Executing SQL Statements from a Text File

    The mysql client typically is used interactively, like this:

    shell> mysql db_name

    However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here:

    shell> mysql db_name < text_file

    If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line:

    shell> mysql < text_file

    If you are already running mysql, you can execute an SQL script file using the source or . command:

    mysql> source file_name
    mysql> . file_name

    Sometimes you may want your script to display progress information to the user. For this you can insert statements like this:

    SELECT ” AS ‘ ‘;

    The statement shown outputs .

  23. Posted 3/1/2007 at 11:02 pm | Permalink

    This info just saved my life. I thank you. bows

  24. Posted 4/14/2007 at 11:06 am | Permalink

    thanks you soo much for you documentation on how to convert the charset, it saved me a lot of time .. grüsse

  25. Posted 5/20/2007 at 8:49 pm | Permalink

    Thank you for this post, especially the iconv bit! It worked perfectly!

  26. gomba
    Posted 5/31/2007 at 2:23 pm | Permalink

    I had problem converting a joomla ISO DB to UTF8 using the iconv -f iso-8859-15 -t utf8…
    It didn’t convert german special caracters.

    Then I used iconv -f CP1252 -t utf8 and it worked perfecly….

  27. strawks
    Posted 7/13/2007 at 2:08 pm | Permalink

    I had to convert a database from iso latin1 to UTF-8, but I’m not an expert in encodings.
    What I found out is that MySQL obviously uses the character set you specify when creating a table to store the data and makes conversions depending of the client.
    If the DB is stored using UTF-8 and the client specify latin1 (using the SET NAMES ‘latin1′ statement) after connecting, MySQL will convert all data to latin1 before sending the results to the client.

    My need was to convert a database stored in latin1 to UTF-8, so I simply used the following commands (a quick hack) and MySQL would do the rest :

    mysqldump –opt -u db_user -p db_name | sed “s/CHARSET=latin1/CHARSET=utf8/” > db.dump.sql

    mysql -u db_user -p -D db_name < db.dump.sql

    As suggested by Egil, specifying the charset to these commands would be wise.

    The default charset of my server is latin1 so db.dump.sql is encoded in latin1, sed replaces all default charset for each table from latin1 to UTF-8 and the second mysql will parse the file using the latin1 charset, send the query to the server in latin1 and the server will convert and store all the data in UTF-8.

  28. Posted 1/25/2008 at 8:15 am | Permalink

    Here is a technique I found that allows you to do the conversion to unicode through phpMyAdmin. I have tried this personally with a WP site/database and it worked for me on a site that has both English and Japanese text. This approach becomes really tedious if you have a lot of database tables though.

    Making your Wordpress database portable:
    http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/

  29. Posted 1/25/2008 at 8:48 am | Permalink

    I found another link that talks about your method and ads a little more details. People might find this helpful too:

    http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL

  30. Posted 3/9/2008 at 9:48 pm | Permalink

    AFAIK, this could be done much simpler:

    ALTER TABLE table_name CHARSET ‘utf8′;

    This is what official MySQL documentation says: “If you want to change all CHAR/VARCHAR/TEXT columns to a new character set you can do: ALTER TABLE table_name CHARACTER SET character_set_name;”

  31. Posted 3/10/2008 at 12:41 pm | Permalink

    I’m afraid that does not convert the actual content. It won’t work in this case. Trust me, I’d have done it, otherwise ;-)

  32. Posted 3/13/2008 at 3:50 pm | Permalink

    Sorry, I have mistaken with the statement: it should be

    ALTER TABLE table_name CONVERT TO CHARACTER SET ‘utf8′;

    I have recently faced the same problem and that statement did really help!

    You can read about it here: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

    Believe me, it works :-)

  33. Posted 3/20/2008 at 3:55 am | Permalink

    Damn… I don’t get anything.
    Seems, it DOES work.

    Where to write this? mysqldump –opt -u root -p wordpress > wordpress.sql

    No idea where to start. Please help me guys to fix this old Mongolian DB!

  34. Posted 3/22/2008 at 7:56 pm | Permalink

    Where to write this?

    Frank, you will need to log into the server with your SSH client (provided you have SSH access enabled). And in the command line you type those commands.

    Alternatively, you can probably do this with the help of PHP Shell.

    Please feel free to contact me if you need help: sjinks (at) sjinks (dot) org (dot) ua

    Regards,
    Vladimir

12 Trackbacks

  1. By Climb to the Stars (Stephanie Booth) on 2/19/2005 at 7:09 pm

    Problèmes d’encodage MySQL

    Un joli mélange de latin-1 et d’utf-8 dans ma base de données. Un script python pour nettoyer tout ça….

  2. By 星☆空 on 2/21/2005 at 10:19 pm

    Linux 下将 MySQL 编�转�至 UTF-8 的实例

    Ping Back�自:blog.csdn.net

  3. [...] log brukte iso-8859… sॠmॠta en runde og konvertere db’n til utf-8. Fant en blog med info pॠdet, sॠdet blir om ei lenge..
    [...]

  4. By Le monde selon waffi » Techno bla-bla on 11/10/2005 at 7:49 am

    [...] Pour ceux qui cherchent à  se débarasser une fois pour toute de l’encodage ISO-8859-1 pour s’amuser avec les vertus d’Unicode(TM) UTF-8, je vous suggère ce guide. Ça marché numéro un! [...]

  5. [...] Ganz einfach. Bei Climb to the Stars gibt es eine sehr schöne Anleitung zur “Konvertierung von ISO-8859-1x nach UTF-8″ die in meinem Fall wunderbar weitergeholfen hat. [...]

  6. [...] Linki: WP Support post Oryginalna informacja jak to zrobić [...]

  7. [...] Erstes Problem: die Zeichencodierung Wordpress liefert seine Seiten UTF-8-codiert aus und die importierten Datensätze in MySQL haben ISO-8859-1 als Format. Was macht man da? Google fragen und das hier finden. Kurz zusammengefasst habe ich folgendes gemacht: [...]

  8. By Weblogger.ch » Passage à UTF-8 on 9/7/2006 at 10:16 pm

    [...] Yeah, il était temps, merci a climbtothestarts pour le tuyau, en gros il suffit d’avoir un accès SSH et taper ce qui suit: [...]

  9. [...] También hemos cambiado el encoding de ISO-8859-1 a UTF-8. Esto ha sido bastante sencillo siguiendo estas instrucciones. Básicamente se trata de usar la utilidad iconv sobre los volcados de la base de datos y los ficheros del theme que estéis usando. Si notáis que algún carácter sale raro en alguna de las páginas por favor avisad para que podamos corregirlo. añadir a: - [...]

  10. [...] Climb to the Stars (Stephanie Booth) » Converting MySQL Database Contents to UTF-8 [...]

  11. By Climb to the Stars (Stephanie Booth) on 2/3/2007 at 11:53 pm

    Invalid argument supplied for foreach() in wp-capabilities.php: Case Cracked!…

    Finally. At last. Endlich. Enfin.

    Once more, while trying to transfer a WordPress installation from one server to another, I found myself facing the dreaded problem which locks me out of my WordPress install with a rather cryptic message:

    Warning: In…

  12. [...] the fact that some of my blog posts have already been around the world three times (my stuff on MySQL encoding problems and multiple WordPress installations have remained popular for years — the latter with [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*
« Back to text comment
Contact