Converting MySQL Database Contents to UTF-8

[fr] Voici comment j'ai converti le contenu de ma base de données WordPress en UTF-8. C'est assez simple en soi, mais ça m'a pris longtemps pour comprendre comment le faire!

A few weeks ago, I discovered (to my horror) that my site was not UTF-8, as I thought it was. Checking my pages in the validator produced this error:

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the XML declaration (utf-8). I will use the value from the HTTP header (iso-8859-1).

In all likeliness, my server adds a default header (iso-8859-1) to the pages it serves. When I switched to WordPress, I was careful to save all my import files as UTF-8, and I honestly thought that everything I had imported into the database was UTF-8. Somewhere in the process, it got switched back to iso-8859-1 (latin-1).

The solution to make sure the pages served are indeed UTF-8, as specified in the meta tags of my HTML pages, is to add the following line to .htaccess:

AddDefaultCharset OFF

(If one wanted to force UTF-8, AddDefaultCharset UTF-8 would do it, but actually, it’s better to leave the possibility to serve pages with different encodings, isn’t it?)

Now, when I did that, of course, all the accented characters in my site went beserk — proof if it was needed that my database content was not UTF-8. Here is the story of what I went through (and it took many days to find the solution, believe me, although it takes only 2 minutes to do once everything is ready) to convert my database content from ISO-8859-1 to UTF-8. Thanks a lot to all those who helped me through this — and they are many!

First thing, dump the database. I always forget the command for dumps, so here it is:

mysqldump --opt -u root -p wordpress > wordpress.sql

As we’re going to be doing stuff, it might be wise to make a copy of the working wordpress database. I did that by creating a new database in PhpMyAdmin, and importing my freshly dumped database into it:

mysql -u root -p wordpress-backup < wordpress.sql

Then, conversion. I tried a PHP script, I tried BBEdit, and they seemed to mess up. (Though as I had other issues elsewhere, they may well have worked but I mistakenly thought the problem was coming from there.) Anyway, command-line conversion with iconv is much easier to do:

iconv -f iso-8859-15 -t utf8 wordpress.sql > wordpress-iconv.sql

Then, import into the database. I first imported it into another database, edited wp-config.php to point to the new database, and checked that everything was ok:

mysql -u root -p wordpress-utf8 < wordpress-iconv.sql

Once I was happy that it was working, I imported my converted dump into the WordPress production database:

mysql -u root -p wordpress < wordpress-iconv.sql

On the way there, I had some trouble with MySQL. The MySQL dump more or less put the content of all my weblog posts on one line. For some reason, it didn’t cause any problems when importing the dump before conversion, to create the backup database, but it didn’t play nice after conversion.

I got this error when trying to import:

ERROR 1153 at line 378: Got a packet bigger than 'max_allowed_packet'

Line 378 contained half my weblog posts… and was obviously bigger than the 1Mb limit for max_allowed_packet (the whole dump is around 2Mb).

I had to edit my.cnf (/etc/mysql/my.cnf on my system) and change the value for max_allowed_packet in the section titled [mysqld]. I set it to 8Mb. Then, I had to stop mysql and restart it: mysqladmin -u root -p shutdown to stop it, and mysqld_safe & to start it again (as root).

This is not necessarily the best way to do it, and it might not work like that on your system, but it’s what I did and the site is now back up again. Comments welcome, and hope this can be useful to others!

Similar Posts:

This entry was posted in Wordpress and tagged command line, convert, database, encoding, error, iso-8859-1, latin-1, max_allowed_packet, mysql, Real Live Code, restart, utf-8, Weblog Technology, Wordpress. Bookmark the permalink.

64 Responses to Converting MySQL Database Contents to UTF-8

  1. fatalfuj says:

    Your post covered exactly what I set out to do. Thank you!

  2. scott reynen says:

    FYI, you only got the max_allowed_packet error with the UTF-8 version because UTF-8 takes more space (more bytes) than ISO-8859-15.

  3. doug says:

    i don’t have shell access but would love to know how to do this using phpmyadmin. any tips?

    i’ve checked phpmyadmin and iconv docs — and the answer may be there — but alas i’m quite the novice and unsure how to word the query within phpmyadmin.

    i am migrating a home-built coldfusion blog to wordpress and it’s full of special characters.

  4. doug says:

    i don't have shell access but would love to know how to do this using phpmyadmin. any tips?

    i've checked phpmyadmin and iconv docs — and the answer may be there — but alas i'm quite the novice and unsure how to word the query within phpmyadmin.

    i am migrating a home-built coldfusion blog to wordpress and it's full of special characters.

  5. Moshu says:

    It is a very useful post, but as the previous commenter said: those people who have only phpmyadmin access to their database (i.e. the majority of the average WP users!) will just stare at it. Is there any chance to “convert” this instruction for them? Thanks!

  6. Pingback: Climb to the Stars (Stephanie Booth)

  7. Problèmes d'encodage MySQL

    Un joli mélange de latin-1 et d'utf-8 dans ma base de données. Un script python pour nettoyer tout ça….

  8. Pingback: 星☆空

  9. Uffe says:

    I like the part about AddDefaultCharset OFF – didn’t think about that myself. For those without shell access and without access to iconv, I suggest to look at the method I used and has described in the wp support forum, in a thread namedchange from ISO-8859-15 to UTF-8

  10. Pingback: Blog’n » Blog Archive » Blog oppdatering

  11. keli says:

    a few notes to those without shell access (and even on a windows machine on top of that :) )

    phpMyAdmin can dyump mysql data just fine (although it calls it “exporting”). Specify a file name and check the “save to file” option, and make sure you select all your tables.

    As for backup, it’s unnecessary to import it back. just make a copy of the dump if you really want to. This description does contain a lot of redundant saves -beyond your normal safety needs-.

    When you’re finished with conversion –see Uffe’s post above– just use phpMyAdmin’s “import file” feature to import it back.

  12. Diego says:

    Very good! Very helpful! Work finelly fixing the problem to a Portuguese (Brazilian) site. Thank you!

  13. streaky says:

    FYI for windows users – open the doc in wordpad and save with utf-8 encoding – should cure what ails you :)

  14. Cloud says:

    Replace utf8 with utf-8 on some systems…

  15. marc says:

    Eu le même problème et ayant suivi vos pas ça a marché niquel: zéro problèmes! Je vous remercie énormement!

  16. Pingback: Le monde selon waffi » Techno bla-bla

  17. Alonzo says:

    Thanks, this how-to was working great for me.

  18. Paolo says:

    I’ve find this article partially usefull for me but I’ve imported from a custom CMS and I’ve found better doing “iconv -f windows-1252 -t utf8 file.sql > file_utf8.sql” ’cause sometimes ppl do copy/paste from Word into the textarea.

  19. Paolo says:

    I've find this article partially usefull for me but I've imported from a custom CMS and I've found better doing “iconv -f windows-1252 -t utf8 file.sql > file_utf8.sql” 'cause sometimes ppl do copy/paste from Word into the textarea.

  20. Marcus says:

    Wow this is great! I was just pulling my hair wondering on how to convert the database :)

    Thanx for sharing the knowledge

  21. Thanks for that solution – worked like a charm, when I tried to import a greymatter blog written in danish.

    :o ) A

  22. Pingback: Bis einer weint… » Blog Archive » Konvertieren auf Zeichencodierung UTF-8

  23. sogas says:

    It worked perfect for me. Nice tip!

  24. Pingback: Slawek’s Hideout » Wordpress ISO-8859-2 > UTF-8

  25. Pingback: Stefan Macke » Blog Archive » Umstieg von b2evolution auf Wordpress

  26. xuan says:

    Hi,

    I have same problem, can you help me convert ? thank !

  27. Max says:

    I think there’s an easier solution than restarting mysql (although I haven’t tested this, because I’ve never had a database big enough). Simply tell mysqldump to limit the size of the packets it creates by adding the following commandline argument:

    –set-variable=max_allowed_packet=1M

  28. Max says:

    That should have been two dashes, not an em dash. Let me try again:

    --set-variable=max_allowed_packet=1M

  29. gorkem says:

    For those using libiconv-1.9.2-1 ,new iconv file, “utf8″ does not work use “utf-8″ instead as in “iconv -f iso-8859-9 -t utf-8 wordp…..” perfectly converted iso türkish database to utf-8. Also be sure to convert your site language files (in joomla/mambo case /languages/turkish.php for example) to utf-8 by opening in textpad/notepad and saving as utf-8.

  30. gorkem says:

    add to above. using phpmyadmin is fine, but do not compress, zip version of phpmyadmin and winrar/winzip does not match, dumpt to just sql, also no need to restart mysql server (at least in localhost), just refresh your page.

  31. Egil says:

    Just a quick note – it is wise to specify the character set when exporting/dumping from mysql with mysqldump, using the –default-character-set=latin1 command. If not, and your system is defaulting to utf8 after an upgrade, you get iso-8859-15 data inside a utf8 file, which took me quite some time to find out (sigh).

  32. Pingback: Weblogger.ch » Passage à UTF-8

  33. Pingback: Actualización de WordPress y cambio de encoding - Trabber Blog

  34. Pingback: geheimwerk.de » Blog Archiv » Encoding-Wirrwarr

  35. eWebmaster says:

    OR you can

    Executing SQL Statements from a Text File

    The mysql client typically is used interactively, like this:

    shell> mysql db_name

    However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here:

    shell> mysql db_name < text_file

    If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line:

    shell> mysql < text_file

    If you are already running mysql, you can execute an SQL script file using the source or . command:

    mysql> source file_name mysql> . file_name

    Sometimes you may want your script to display progress information to the user. For this you can insert statements like this:

    SELECT ” AS ‘ ‘;

    The statement shown outputs .

  36. Pingback: Climb to the Stars (Stephanie Booth)

  37. Invalid argument supplied for foreach() in wp-capabilities.php: Case Cracked!…


    Finally. At last. Endlich. Enfin.


    Once more, while trying to transfer a WordPress installation from one server to another, I found myself facing the dreaded problem which locks me out of my WordPress install with a rather cryptic message:


    Warning: In…

  38. John says:

    This info just saved my life. I thank you. bows

  39. wal-d says:

    thanks you soo much for you documentation on how to convert the charset, it saved me a lot of time .. grüsse

  40. Marek says:

    Thank you for this post, especially the iconv bit! It worked perfectly!

  41. gomba says:

    I had problem converting a joomla ISO DB to UTF8 using the iconv -f iso-8859-15 -t utf8… It didn’t convert german special caracters.

    Then I used iconv -f CP1252 -t utf8 and it worked perfecly….

  42. strawks says:

    I had to convert a database from iso latin1 to UTF-8, but I’m not an expert in encodings. What I found out is that MySQL obviously uses the character set you specify when creating a table to store the data and makes conversions depending of the client. If the DB is stored using UTF-8 and the client specify latin1 (using the SET NAMES ‘latin1′ statement) after connecting, MySQL will convert all data to latin1 before sending the results to the client.

    My need was to convert a database stored in latin1 to UTF-8, so I simply used the following commands (a quick hack) and MySQL would do the rest :

    mysqldump –opt -u db_user -p db_name | sed “s/CHARSET=latin1/CHARSET=utf8/” > db.dump.sql

    mysql -u db_user -p -D db_name < db.dump.sql

    As suggested by Egil, specifying the charset to these commands would be wise.

    The default charset of my server is latin1 so db.dump.sql is encoded in latin1, sed replaces all default charset for each table from latin1 to UTF-8 and the second mysql will parse the file using the latin1 charset, send the query to the server in latin1 and the server will convert and store all the data in UTF-8.

  43. Pingback: Climb to the Stars (Stephanie Booth) » Going Solo Venues, Open Stage, and Link Love

  44. Christophe says:

    Here is a technique I found that allows you to do the conversion to unicode through phpMyAdmin. I have tried this personally with a WP site/database and it worked for me on a site that has both English and Japanese text. This approach becomes really tedious if you have a lot of database tables though.

    Making your WordPress database portable: http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/

  45. Christophe says:

    I found another link that talks about your method and ads a little more details. People might find this helpful too:

    http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL

  46. AFAIK, this could be done much simpler:

    ALTER TABLE table_name CHARSET ‘utf8′;

    This is what official MySQL documentation says: “If you want to change all CHAR/VARCHAR/TEXT columns to a new character set you can do: ALTER TABLE table_name CHARACTER SET character_set_name;”

  47. Stephanie says:

    I’m afraid that does not convert the actual content. It won’t work in this case. Trust me, I’d have done it, otherwise ;-)

  48. Stephanie says:

    I'm afraid that does not convert the actual content. It won't work in this case. Trust me, I'd have done it, otherwise ;-)

  49. Sorry, I have mistaken with the statement: it should be

    ALTER TABLE table_name CONVERT TO CHARACTER SET ‘utf8′;

    I have recently faced the same problem and that statement did really help!

    You can read about it here: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

    Believe me, it works :-)

  50. Frank says:

    Damn… I don’t get anything. Seems, it DOES work.

    Where to write this? mysqldump –opt -u root -p wordpress > wordpress.sql

    No idea where to start. Please help me guys to fix this old Mongolian DB!

  51. Where to write this?

    Frank, you will need to log into the server with your SSH client (provided you have SSH access enabled). And in the command line you type those commands.

    Alternatively, you can probably do this with the help of PHP Shell.

    Please feel free to contact me if you need help: sjinks (at) sjinks (dot) org (dot) ua

    Regards, Vladimir

  52. DrTebi says:

    Hello there,

    unfortunately, it is not enough to add the –max_allowed_packet option to your mysql import statement. While this will increase the allowed packed size the client can send, it will not change the allowed packet size the mysql server. See here:
    http://lists.mysql.com/mysql/198070

    So, if your dump is very big, you will have to edit my.cnf to increase the packet size there–at least temporarily. Don't forget to restart after editing my.cnf

    DrTebi

  53. Bimal says:

    I am not sure what happens if I do the below:
    Modidfy the original database tables accpet utf-8 over the running data.
    (Note that taking a backup is always a safe way to start).

    alter table wordpress_table_1
    change latin_column utf_column text character set utf8 collate utf8_bin NOT NULL;

    If this works, you do not need to make another database restoration.

  54. Oliver says:

    Great! I was having the same host header error on the validator. I had actually forgotten where the host header came from but your post cleared it up for me.

    -Thanks

  55. Pingback: MySQL + UTF-8 = Not So Obvious « Ward’s Words

  56. Srinivas Nalla says:

    This a one of Important Site I am looking for. It helped me alot to bring the data of iso88591 to utf8. Thanks to the initiator and for all the pains underwent to prepare this doc

  57. Pingback: Pasando tablas de latin1 a UTF8

  58. Pingback: ich habe utf8 gemacht. › zeugs.

  59. Eroan says:

    Thanks for these mysql tips, I’ve been looking for a solution with UTF8 for more tnah 2 days… I thought there was no solution :/

  60. Ziad Rahhal says:

    Hi,

    @Bimal: be careful from Foreign Keys when you do this: alter table wordpress_table_1 change latin_column utf_column text character set utf8 collate utf8_bin NOT NULL;

    I tried something similar like: alter table convert to character set utf8 collate utf8_general_ci; and it doesn’t work at all if you have foreign keys in the tables in question. You need to drop foreign keys first. The error is (lets say your table is called ): ERROR 1025 (HY000): Error on rename of ‘./stockUTF8/#sql-3ea_17′ to ‘./stockUTF8/items’ (errno: 150)

    And probably the method that Stephanie supplied won’t work with Foreign Keys. I am not sure, I need to test it with my tables that have foreign keys set.

  61. Ziad Rahhal says:

    I correct my previous post: the method that Stephanie used works fine even with Foreign Keys. However, iconv didn’t convert from latin1(iso-8859-1) to utf8 in my case. I haven’t investigated iconv yet. What I did is manually change every occurrence of “latin-1″ to “utf8″ in the .sql file.

  62. yash says:

    this wordpress plugin (http://www.sociable.es/facebook-wordpress-plugin-3-0/) creating Unicode bug

    when i submit my blog link to facebook it display like odd charecters

    “¤œ¤¾¤‡¤¤¤œ¤¾¤‡¤¤…”

    the original text is “डिजाइनदार राखियों की…”

    and when i uninstall this plugin, everything working fine

    my blog: http://www.shreshthbharat.in (please try to post a link on facebook you’ll see odd character in preview )

    there are some tables are in latin1_swedish_ci i change it to “utf8_unicode_ci” but still problem is there

    i donot want to leave this powerful plugin

    please tell me how do i fix this bug

  63. Make sure you’re not confusing collation with the actual encoding the content is in.

  64. SIFE says:

    With out dumping the database, and then restore it back: $ mysql -u root -p password: mysql> ALTER DATABASE wordpress-iconv CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>